Learning to Parse from a Treebank: Combining TBL and ILP

Author

  • Miloslav Nepil
Abstract

Parsing natural language, with its substantial structural complexity and ambiguity, has proved to be a puzzle. While most attempts in this area so far have relied on hand-built parsers, the difficulties inherent in manually constructing a natural language grammar have led to efforts to induce the grammar automatically. The approach to automatic grammar induction presented in this paper has resulted in the design and implementation of the system GRIND (Grammar Induction), which is capable of learning a sequence of context-dependent parse actions from a given corpus of labelled derivation trees. To this end, GRIND combines two established machine learning methods: transformation-based learning (TBL) and inductive logic programming (ILP). Trained and tested on the SUSANNE corpus, GRIND reached an accuracy of 96% and a recall of 68%.
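
The TBL half of this combination can be illustrated with a generic, Brill-style learning loop: start from a baseline annotation, repeatedly choose the candidate transformation that removes the most errors relative to the gold standard, apply it, and stop when no rule helps any more. This is a minimal sketch under stated assumptions, not GRIND itself: it operates on toy part-of-speech tag sequences rather than on parse actions, uses a single hypothetical rule template (change one tag to another when the previous tag matches), and omits the ILP component entirely. All names (`Rule`, `learn_tbl`, `candidate_rules`, `errors`) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Seq = List[str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule template: change `from_tag` to `to_tag`
    whenever the preceding tag is `prev_tag`."""
    from_tag: str
    to_tag: str
    prev_tag: str

    def apply(self, tags: Seq) -> Seq:
        # Context is read from the input sequence, so all matching
        # positions are rewritten simultaneously on a copy.
        out = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == self.from_tag and tags[i - 1] == self.prev_tag:
                out[i] = self.to_tag
        return out


def errors(pred: List[Seq], gold: List[Seq]) -> int:
    """Count positions where the current annotation disagrees with gold."""
    return sum(p != g for ps, gs in zip(pred, gold) for p, g in zip(ps, gs))


def candidate_rules(pred: List[Seq], gold: List[Seq]) -> Set[Rule]:
    """Instantiate the template only at positions that are currently wrong."""
    cands = set()
    for ps, gs in zip(pred, gold):
        for i in range(1, len(ps)):
            if ps[i] != gs[i]:
                cands.add(Rule(ps[i], gs[i], ps[i - 1]))
    return cands


def learn_tbl(initial: List[Seq], gold: List[Seq],
              min_gain: int = 1, max_rules: int = 50) -> Tuple[List[Rule], List[Seq]]:
    """Greedy TBL: keep the rule with the largest net error reduction."""
    current = [list(s) for s in initial]
    learned: List[Rule] = []
    while len(learned) < max_rules:
        base = errors(current, gold)
        best, best_gain = None, 0
        # Sort candidates so ties are broken deterministically.
        for rule in sorted(candidate_rules(current, gold), key=repr):
            gain = base - errors([rule.apply(s) for s in current], gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None or best_gain < min_gain:
            break
        learned.append(best)
        current = [best.apply(s) for s in current]
    return learned, current


# Toy baseline: "can" is always tagged as a modal (MD) at first.
gold = [["DT", "NN", "VB"], ["PRP", "MD", "VB"], ["DT", "NN", "VB"]]
initial = [["DT", "MD", "VB"], ["PRP", "MD", "VB"], ["DT", "MD", "VB"]]
rules, final = learn_tbl(initial, gold)
print(rules)
print(errors(initial, gold), "->", errors(final, gold))
```

On this toy data the loop learns a single rule, `Rule(from_tag='MD', to_tag='NN', prev_tag='DT')`, which fixes both baseline errors. The output of TBL is the ordered list of learned rules, which would be applied in the same order to new input.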

Similar resources

Automated Parser Construction from a Treebank by means of TBL and ILP

Considering the difficulties inherent in the manual construction of natural language parsers, we have designed and implemented our system GRIND, which is capable of learning a sequence of context-dependent parsing actions from an arbitrary corpus containing labelled parse trees. Trained and tested on the SUSANNE corpus, GRIND reaches an accuracy of 96% and a recall of 68%.

Inducing Deterministic Prolog Parsers from Treebanks: A Machine Learning Approach

This paper presents a method for constructing deterministic Prolog parsers from corpora of parsed sentences. Our approach uses recent machine learning methods for inducing Prolog rules from examples (inductive logic programming). We discuss several advantages of this method compared to recent statistical methods and present results on learnin...

Combining LAPIS and WordNet for Learning of LR Parsers with Optimal Semantic Constraints

There is a history of research focused on learning shift-reduce parsers from syntactically annotated corpora by means of machine learning techniques based on logic. The presence of lexical semantic tags in the treebank has proved useful for learning semantic constraints that limit the amount of nondeterminism in the parsers. The grain of the semantic tags used is of direct importan...

Unsupervised Parse Selection for HPSG

Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as t...

Treeblazing: Using External Treebanks to Filter Parse Forests for Parse Selection and Treebanking

We describe “treeblazing”, a method of using annotations from the GENIA treebank to constrain a parse forest from an HPSG parser. Combining this with self-training, we show significant dependency score improvements in a task of adaptation to the biomedical domain, reducing error rate by 9% compared to out-of-domain gold data and 6% compared to self-training. We also demonstrate improvements in ...

Publication date: 2001